The visual system of a robot has different requirements depending on the application: it may require high accuracy or reliability, be constrained by limited resources, or need fast adaptation to dynamically changing environments. In this work, we focus on the instance segmentation task and provide a comprehensive study of different techniques that allow adapting an object segmentation model in the presence of novel objects or different domains. We propose a pipeline for fast instance segmentation learning, designed for robotic applications where data arrives in streams. It is based on a hybrid method leveraging a pre-trained CNN for feature extraction and fast-to-train kernel-based classifiers. We also propose a training protocol that reduces training time by performing feature extraction during data acquisition. We benchmark the proposed pipeline on two robotics datasets and then deploy it on a real robot, the iCub humanoid. To this end, we adapt our method to an incremental setting in which the robot learns novel objects online. The code to reproduce the experiments is publicly available on GitHub.
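The hybrid idea above (frozen pre-trained features plus a kernel classifier trained in closed form) can be sketched as follows. This is a minimal toy, not the paper's pipeline: random vectors stand in for CNN embeddings, and a plain RBF kernel ridge regressor stands in for the fast kernel-based classifier.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise squared distances, then Gaussian kernel
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

class KernelRidgeClassifier:
    """Binary classifier trained in closed form on fixed features:
    fast to (re)train, which suits streaming robotic data."""
    def __init__(self, lam=1e-3, gamma=0.5):
        self.lam, self.gamma = lam, gamma

    def fit(self, X, y):
        self.X = X
        n = len(X)
        K = rbf_kernel(X, X, self.gamma)
        # Solve (K + lam * n * I) alpha = y -- no iterative optimization
        self.alpha = np.linalg.solve(K + self.lam * n * np.eye(n),
                                     y.astype(float))
        return self

    def predict(self, Xq):
        return np.sign(rbf_kernel(Xq, self.X, self.gamma) @ self.alpha)

# Toy "features" standing in for pre-extracted CNN embeddings
rng = np.random.default_rng(0)
pos = rng.normal(2.0, 0.3, (20, 8))
neg = rng.normal(-2.0, 0.3, (20, 8))
X = np.vstack([pos, neg])
y = np.r_[np.ones(20), -np.ones(20)]

clf = KernelRidgeClassifier().fit(X, y)
preds = clf.predict(np.array([[2.0] * 8, [-2.0] * 8]))
print(preds)
```

Because training reduces to one linear solve over cached features, extracting features during data acquisition (as in the proposed protocol) leaves only this cheap step for training time.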
We consider the task of grasping objects that can be grasped with multiple grasp types by a prosthetic hand. In this setting, communicating the intended grasp type often requires a high user cognitive load, which can be reduced by adopting a shared-autonomy framework. Within this framework, so-called eye-in-hand systems automatically control the hand pre-shaping before the grasp, based on visual input from a wrist-mounted camera. In this paper, we present an eye-in-hand learning-based approach for hand pre-shape classification from RGB sequences. Unlike previous work, we design the system to support the possibility of grasping each considered object part with a different grasp type. To overcome the lack of such data and reduce the need for tedious data collection sessions to train the system, we devise a pipeline for rendering synthetic visual sequences of hand trajectories. We develop a sensorized setup to acquire real human grasping sequences for benchmarking and show that models trained on the synthetic dataset achieve better generalization performance than models trained on real data, when compared to the practical case of training and testing on real data. We finally integrate our model into the Hannes prosthetic hand and show its effectiveness in practice. We make the code and dataset publicly available to reproduce the presented results.
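The synthetic-data idea can be illustrated with a deliberately simplified stand-in: instead of rendering visual sequences, we simulate noisy wrist-pose approach trajectories per grasp type (the grasp-type names and target poses below are invented for illustration), train on synthetic samples, and evaluate on noisier held-out ones acting as a proxy for real sequences.

```python
import numpy as np

rng = np.random.default_rng(1)

def synth_trajectory(grasp_type, T=10, noise=0.02):
    """Simulate a wrist trajectory approaching an object part.
    Each grasp type targets a different end pose (a stand-in for
    rendering a synthetic visual sequence)."""
    targets = {"power":   np.array([0.0, 0.0, 0.10]),
               "pinch":   np.array([0.1, 0.05, 0.05]),
               "lateral": np.array([-0.1, 0.05, 0.08])}
    start = np.array([0.0, -0.3, 0.3])
    ts = np.linspace(0, 1, T)[:, None]
    traj = (1 - ts) * start + ts * targets[grasp_type]
    return traj + rng.normal(0, noise, traj.shape)

labels = ["power", "pinch", "lateral"]

# "Train" a nearest-centroid classifier on synthetic trajectories
train = {g: np.mean([synth_trajectory(g).ravel() for _ in range(50)], axis=0)
         for g in labels}

def classify(traj):
    flat = traj.ravel()
    return min(labels, key=lambda g: np.linalg.norm(flat - train[g]))

# Evaluate on held-out, noisier trajectories (proxy for real sequences)
correct = sum(classify(synth_trajectory(g, noise=0.05)) == g
              for g in labels for _ in range(20))
acc = correct / 60
print(f"accuracy: {acc:.2f}")
```

The point of the sketch is the workflow (train on cheap synthetic samples, test on a harder distribution), not the model, which in the paper operates on RGB sequences rather than pose vectors.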
6D object pose tracking has been extensively studied in the robotics and computer vision communities. The most promising solutions, which leverage deep neural networks and/or filtering and optimization, exhibit notable performance on standard benchmarks. However, to the best of our knowledge, these have not been thoroughly tested against fast object motions. Tracking performance degrades substantially in this scenario, especially for methods that do not achieve real-time performance and introduce non-negligible delays. In this work, we introduce ROFT, a Kalman filtering approach for 6D object pose and velocity tracking from a stream of RGB-D images. By leveraging real-time optical flow, ROFT synchronizes the delayed outputs of low-frame-rate Convolutional Neural Networks for instance segmentation and 6D object pose estimation with the RGB-D input stream, achieving fast and precise 6D object pose and velocity tracking. We test our method on a newly introduced photorealistic dataset, Fast-YCB, comprising fast-moving objects from the YCB model set, and on the dataset for object and hand pose estimation HO-3D. Results show that our approach outperforms state-of-the-art methods for 6D object pose tracking while also providing 6D object velocity tracking. A video of the experiments is provided as supplementary material.
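The core fusion pattern (a Kalman filter that is updated every frame with high-rate velocity cues, here standing in for optical flow, and only occasionally with pose measurements from a slow network) can be sketched in one dimension. This is a minimal constant-velocity toy, not ROFT itself; the rates and noise levels are invented.

```python
import numpy as np

class PoseVelocityKF:
    """Constant-velocity Kalman filter over one pose coordinate.
    High-rate velocity measurements (as from optical flow) keep the state
    fresh between sparse pose measurements from a slow network."""
    def __init__(self, dt, q=1e-3, r_pose=1e-2, r_vel=1e-3):
        self.x = np.zeros(2)                    # [position, velocity]
        self.P = np.eye(2)
        self.F = np.array([[1.0, dt], [0.0, 1.0]])
        self.Q = q * np.eye(2)
        self.r_pose, self.r_vel = r_pose, r_vel

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q

    def _update(self, z, H, r):
        S = H @ self.P @ H + r                  # innovation variance
        K = self.P @ H / S                      # Kalman gain
        self.x = self.x + K * (z - H @ self.x)
        self.P = (np.eye(2) - np.outer(K, H)) @ self.P

    def update_velocity(self, v):               # every frame (flow)
        self._update(v, np.array([0.0, 1.0]), self.r_vel)

    def update_pose(self, p):                   # sparse (slow CNN)
        self._update(p, np.array([1.0, 0.0]), self.r_pose)

# Object moving at 0.5 units/s; frames at 30 Hz, pose network at 5 Hz
dt, true_v = 1 / 30, 0.5
kf = PoseVelocityKF(dt)
rng = np.random.default_rng(2)
for k in range(60):
    true_p = true_v * (k + 1) * dt
    kf.predict()
    kf.update_velocity(true_v + rng.normal(0, 0.02))
    if (k + 1) % 6 == 0:                        # slow pose measurement
        kf.update_pose(true_p + rng.normal(0, 0.05))
print(kf.x)
```

Between the sparse pose updates, the velocity channel keeps the position prediction close to the true trajectory, which is the mechanism that compensates for the network's latency in the paper's setting.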
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits following the many branches at its end, and passes identically to all the downward neurons of the network. Each of the downward neurons will use its copy of this signal as one of many input dendrites, integrate them all, and fire an output if the result is above some threshold. In the artificial neural network, this translates to the fact that the nonlinear filtering of the signal is performed in the upward neuron, meaning that in practice the same activation is shared between all the downward neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model for the biological neuron, where dendrites play an active role: the activation in the output of the upward neuron becomes optional, and instead the signals going through each dendrite undergo independent nonlinear filterings, before the linear combination. We implement this new model into a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit into fully connected and convolutional layers and estimate the resulting change in FLOPs and number of weights. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements over standard ResNets of up to 1.73%. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
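The structural difference can be made concrete with a forward pass in NumPy. This is a guess at the unit's shape based only on the description above (per-dendrite ReLU before the weighted sum); the per-dendrite threshold matrix `T` is an assumed parameterization, not necessarily the paper's.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def standard_layer(x, W, b):
    """Standard unit: one shared activation applied after the
    linear combination of inputs."""
    return relu(W @ x + b)

def dendritic_layer(x, W, T):
    """Hypothetical 'active dendrite' unit: each dendrite (j, i)
    filters its own copy of input i through its own ReLU (with a
    per-dendrite threshold T[j, i]) BEFORE the linear combination."""
    return (W * relu(x[None, :] - T)).sum(axis=1)

rng = np.random.default_rng(3)
x = rng.normal(size=4)
W = rng.normal(size=(3, 4))     # 3 downward neurons, 4 inputs each
b = np.zeros(3)
T = rng.normal(size=(3, 4))     # one extra parameter per dendrite

y_std = standard_layer(x, W, b)
y_den = dendritic_layer(x, W, T)
print(y_std.shape, y_den.shape)
```

Note the weight count: the dendritic variant roughly doubles the parameters per layer (one threshold per weight), which matches the abstract's concern with estimating the change in weights and FLOPs.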
Estimating the 6D pose of objects is one of the major fields in 3D computer vision. Following the promising outcomes of instance-level pose estimation, research trends are heading towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and quantity of provided poses. We propose the new category-level 6D pose dataset HouseCat6D featuring 1) multi-modality of polarimetric RGB+P and depth, 2) 194 highly diverse objects across 10 household object categories, including 2 photometrically challenging categories, 3) high-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage, and 5) a checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
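For readers unfamiliar with how pose annotation or estimation error (such as the 1.35 mm to 1.74 mm range above) is quantified, a standard way to compare two 6D poses is to report the rotation angle between the two rotation matrices and the Euclidean distance between the two translations. A minimal sketch of that computation:

```python
import numpy as np

def pose_error(R1, t1, R2, t2):
    """Rotation error (degrees) and translation error (same units as t)
    between two 6D poses (R, t). The rotation angle comes from the
    trace identity trace(R) = 1 + 2*cos(theta)."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    rot_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    trans = np.linalg.norm(t1 - t2)
    return rot_deg, trans

# Example: a 5-degree rotation about z and a 1.5 mm offset (in meters)
th = np.radians(5.0)
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th),  np.cos(th), 0.0],
               [0.0,         0.0,        1.0]])
rot, trans = pose_error(np.eye(3), np.zeros(3), Rz, np.array([0.0015, 0, 0]))
print(round(rot, 1), trans)
```

Whether HouseCat6D reports its annotation error exactly this way is not stated in the abstract; the snippet only illustrates the standard metric.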
A systematic review of machine-learning strategies for improving the generalizability (cross-subject and cross-session) of electroencephalography (EEG)-based emotion classification was performed. In this context, the non-stationarity of EEG signals is a critical issue and can lead to the Dataset Shift problem. Several architectures and methods have been proposed to address this issue, mainly based on transfer learning methods. 418 papers were retrieved from the Scopus, IEEE Xplore and PubMed databases through a search query focusing on modern machine learning techniques for generalization in EEG-based emotion assessment. Among these papers, 75 were found eligible based on their relevance to the problem. Studies lacking a specific cross-subject and cross-session validation strategy, or making use of other biosignals as support, were excluded. Based on the analysis of the selected papers, a taxonomy of the studies employing Machine Learning (ML) methods was proposed, together with a brief discussion of the different ML approaches involved. The studies with the best results in terms of average classification accuracy were identified, suggesting that transfer learning methods perform better than other approaches. A discussion is proposed on the impact of (i) the emotion theoretical models and (ii) psychological screening of the experimental sample on classifier performance.
This volume contains revised versions of the papers selected for the third volume of the Online Handbook of Argumentation for AI (OHAAI). Previously, formal theories of argument and argument interaction have been proposed and studied, and this has led to the more recent study of computational models of argument. Argumentation, as a field within artificial intelligence (AI), is highly relevant for researchers interested in symbolic representations of knowledge and defeasible reasoning. The purpose of this handbook is to provide an open access and curated anthology for the argumentation research community. OHAAI is designed to serve as a research hub to keep track of the latest and upcoming PhD-driven research on the theory and application of argumentation in all areas related to AI.
We analyze the problem of detecting tree rings in microscopy images of shrub cross sections. This can be regarded as a special case of the instance segmentation task with several particularities, such as the concentric circular ring shape of the objects and high precision requirements, due to which existing methods do not perform sufficiently well. We propose a new iterative method which we term Iterative Next Boundary Detection (INBD). It intuitively models the natural growth direction, starting from the center of the shrub cross section and detecting the next ring boundary in each iteration step. In our experiments, INBD shows superior performance to generic instance segmentation methods and is the only one with a built-in notion of chronological order. Our dataset and source code are available at http://github.com/alexander-g/INBD.
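The iterate-outward idea can be reduced to a 1-D caricature: given a radial intensity profile, repeatedly find the next boundary (here, the next local peak) strictly beyond the last one, starting from the center. INBD itself operates on 2-D images with a learned boundary detector; this sketch only shows the iteration scheme and its built-in chronological order.

```python
import numpy as np

def next_boundary(profile, start, min_step=2):
    """Find the next ring boundary (local intensity peak) at a radius
    strictly beyond `start` -- a 1-D stand-in for one INBD step."""
    for r in range(start + min_step, len(profile) - 1):
        if profile[r] > profile[r - 1] and profile[r] >= profile[r + 1]:
            return r
    return None

def detect_rings(profile):
    """Iterate outward from the center, collecting one boundary per
    step; the output list is chronologically ordered by construction."""
    rings, r = [], 0
    while (r := next_boundary(profile, r)) is not None:
        rings.append(r)
    return rings

# Synthetic radial profile: Gaussian peaks at radii 10, 25 and 45
# mark the ring boundaries
radii = np.arange(60)
profile = sum(np.exp(-0.5 * ((radii - c) / 1.5) ** 2) for c in (10, 25, 45))
rings = detect_rings(profile)
print(rings)
```

The chronological ordering the abstract highlights falls out of the construction: boundary k is only ever searched for beyond boundary k-1.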
We combine the metrics of distance and isolation to develop the \textit{Analytic Isolation and Distance-based Anomaly (AIDA) detection algorithm}. AIDA is the first distance-based method that does not rely on the concept of nearest-neighbours, making it a parameter-free model. Differently from the prevailing literature, in which the isolation metric is always computed via simulations, we show that AIDA admits an analytical expression for the outlier score, providing new insights into the isolation metric. Additionally, we present an anomaly explanation method based on AIDA, the \textit{Tempered Isolation-based eXplanation (TIX)} algorithm, which finds the most relevant outlier features even in data sets with hundreds of dimensions. We test both algorithms on synthetic and empirical data: we show that AIDA is competitive when compared to other state-of-the-art methods, and it is superior in finding outliers hidden in multidimensional feature subspaces. Finally, we illustrate how the TIX algorithm is able to find outliers in multidimensional feature subspaces, and use these explanations to analyze common benchmarks used in anomaly detection.
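To give a feel for combining distance and isolation, here is a deliberately crude toy (this is NOT the AIDA formula, which the paper derives analytically): over random subsamples, a point that sits far from every subsampled point accumulates a high score, mixing a distance criterion with the subsampling flavour of isolation-based methods.

```python
import numpy as np

def toy_isolation_distance_score(X, n_trials=200, subsample=16, seed=0):
    """Toy illustration only: average, over random subsamples, of each
    point's minimum distance to the subsample. Easily separated (isolated)
    and distant points score high."""
    rng = np.random.default_rng(seed)
    n = len(X)
    scores = np.zeros(n)
    for _ in range(n_trials):
        idx = rng.choice(n, size=subsample, replace=False)
        # Pairwise distances from all points to the subsample
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=-1)
        scores += d.min(axis=1)
    return scores / n_trials

rng = np.random.default_rng(4)
inliers = rng.normal(0, 1, (100, 3))
outlier = np.array([[8.0, 8.0, 8.0]])
X = np.vstack([inliers, outlier])          # outlier is the last row
scores = toy_isolation_distance_score(X)
print(int(np.argmax(scores)))
```

Unlike this toy, AIDA needs no nearest-neighbour count or other tuned parameter, and its outlier score has a closed-form expression rather than being estimated by simulation.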
People constantly use language to learn about the world. Computational linguists have capitalized on this fact to build large language models (LLMs) that acquire co-occurrence-based knowledge from language corpora. LLMs achieve impressive performance on many tasks, but the robustness of their world knowledge has been questioned. Here, we ask: do LLMs acquire generalized knowledge about real-world events? Using curated sets of minimal sentence pairs (n=1215), we tested whether LLMs are more likely to generate plausible event descriptions compared to their implausible counterparts. We found that LLMs systematically distinguish possible and impossible events (The teacher bought the laptop vs. The laptop bought the teacher) but fall short of human performance when distinguishing likely and unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLMs generalize well across syntactic sentence variants (active vs passive) but less well across semantic sentence variants (synonymous sentences), (iii) some, but not all LLM deviations from ground-truth labels align with crowdsourced human judgments, and (iv) explicit event plausibility information emerges in middle LLM layers and remains high thereafter. Overall, our analyses reveal a gap in LLMs' event knowledge, highlighting their limitations as generalized knowledge bases. We conclude by speculating that the differential performance on impossible vs. unlikely events is not a temporary setback but an inherent property of LLMs, reflecting a fundamental difference between linguistic knowledge and world knowledge in intelligent systems.
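The minimal-pair methodology (compare a model's score for a sentence against its reversed-role counterpart) can be sketched without a real LLM. Below, an add-one-smoothed bigram model trained on a three-sentence toy corpus stands in for the language model; the study itself scores sentences with large pretrained models, not n-grams.

```python
import math
from collections import Counter

# Tiny corpus standing in for an LM's training data
corpus = (
    "the teacher bought the laptop . "
    "the teacher bought the book . "
    "the student bought the laptop . "
).split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(unigrams)

def log_prob(sentence):
    """Add-one smoothed bigram log-probability: a stand-in for an
    LLM's sentence score in the minimal-pair comparison."""
    toks = sentence.lower().split()
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
               for a, b in zip(toks, toks[1:]))

plausible = "the teacher bought the laptop"
implausible = "the laptop bought the teacher"
print(log_prob(plausible) > log_prob(implausible))
```

The comparison logic (does the model assign a higher score to the plausible member of the pair?) is the same one applied 1215 times in the study; the finding above is that LLMs pass it reliably for possible-vs-impossible pairs but not for likely-vs-unlikely ones.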